Overview

Dataset statistics

Number of variables: 18
Number of observations: 2050638
Missing cells: 2661654
Missing cells (%): 7.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 281.6 MiB
Average record size in memory: 144.0 B
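
The two derived figures in this block follow from the raw counts. A minimal sketch of that arithmetic, assuming that the missing-cell percentage is taken over variables times observations and that MiB means 1024 squared bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the dataset statistics above.
n_variables = 18
n_observations = 2_050_638
missing_cells = 2_661_654
total_size_mib = 281.6

# Assumption: the percentage is computed over all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 ** 2 bytes.
avg_record_bytes = total_size_mib * 1024 ** 2 / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Under these assumptions both values round to the figures reported above (7.2% and 144.0 B).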

Variable types

Numeric: 8
Categorical: 9
Unsupported: 1

Alerts

currency id has constant value "0.0" [Constant]
country name has a high cardinality: 98 distinct values [High cardinality]
locality name has a high cardinality: 617 distinct values [High cardinality]
market name has a high cardinality: 3235 distinct values [High cardinality]
commodity purchased has a high cardinality: 838 distinct values [High cardinality]
name of currency has a high cardinality: 84 distinct values [High cardinality]
unit of goods measurement has a high cardinality: 125 distinct values [High cardinality]
country id is highly correlated with locality id [High correlation]
locality id is highly correlated with country id [High correlation]
market type id is highly correlated with market name.1 and 1 other field [High correlation]
market name.1 is highly correlated with market type id and 1 other field [High correlation]
country name is highly correlated with currency id and 1 other field [High correlation]
currency id is highly correlated with market type id and 3 other fields [High correlation]
name of currency is highly correlated with country name and 1 other field [High correlation]
country id is highly correlated with country name and 1 other field [High correlation]
country name is highly correlated with country id and 8 other fields [High correlation]
locality id is highly correlated with country name and 1 other field [High correlation]
market id is highly correlated with country name and 3 other fields [High correlation]
commodity purchase id is highly correlated with country name and 2 other fields [High correlation]
name of currency is highly correlated with country id and 8 other fields [High correlation]
market type id is highly correlated with country name and 2 other fields [High correlation]
market name.1 is highly correlated with country name and 2 other fields [High correlation]
measurement id is highly correlated with country name and 1 other field [High correlation]
year recorded is highly correlated with country name and 2 other fields [High correlation]
locality name has 611016 (29.8%) missing values [Missing]
mp_commoditysource has 2050638 (100.0%) missing values [Missing]
price paid is highly skewed (γ1 = 107.5510841) [Skewed]
mp_commoditysource is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
locality id has 26051 (1.3%) zeros [Zeros]
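
The skewness alert reports γ1, the standardized third moment of a variable. A minimal sketch of the population form of that statistic using only the standard library; the profiling tool may apply a small-sample bias correction, so this illustrates the quantity rather than the report's exact computation. The function name `skewness` and the sample lists are hypothetical.

```python
def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant series")
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]    # balanced around the mean
right_tailed = [1, 1, 1, 10]   # one large value pulls the tail to the right

print(skewness(symmetric))
print(skewness(right_tailed))
```

A symmetric sample gives a value of zero, while a long right tail gives a positive value; a γ1 above 100, as reported for price paid, points to a small number of extremely large prices.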

Reproduction

Analysis started: 2022-04-12 14:43:57.053014
Analysis finished: 2022-04-12 14:45:22.593036
Duration: 1 minute and 25.54 seconds
Software version: pandas-profiling v3.1.1
Download configuration: config.json

Variables

country id
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 98
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1004.063627
Minimum: 1
Maximum: 70001
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 33
Q1: 105
Median: 150
Q3: 205
95-th percentile: 270
Maximum: 70001
Range: 70000
Interquartile range (IQR): 100

Descriptive statistics

Standard deviation: 7163.518858
Coefficient of variation (CV): 7.134526806
Kurtosis: 77.12139446
Mean: 1004.063627
Median Absolute Deviation (MAD): 55
Skewness: 8.747616706
Sum: 2058971027
Variance: 51316002.44
Monotonicity: Not monotonic
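
Several of the statistics above are derived from others. A short sketch, using the country id figures as inputs, of the usual definitions (mean as sum over count, CV as standard deviation over mean, variance as squared standard deviation, IQR as Q3 minus Q1). The variable names are illustrative; the definitions are the standard ones and are assumed, not confirmed, to match the tool's.

```python
# Values copied from the country id statistics above.
n_observations = 2_050_638
total = 2_058_971_027
std_dev = 7163.518858
minimum, q1, q3, maximum = 1, 105, 205, 70001

mean = total / n_observations    # Mean = Sum / number of observations
cv = std_dev / mean              # Coefficient of variation = std / mean
variance = std_dev ** 2          # Variance = standard deviation squared
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
value_range = maximum - minimum  # Range = Maximum - Minimum

print(f"mean={mean:.6f} cv={cv:.6f} variance={variance:.2f}")
print(f"iqr={iqr} range={value_range}")
```

The recomputed values agree with the reported ones up to the rounding of the inputs.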
Histogram with fixed size bins (bins=50)
Value              Count     Frequency (%)
205                137746    6.7%
115                137093    6.7%
238                116588    5.7%
196                82099     4.0%
155                73843     3.6%
116                72437     3.5%
138                61188     3.0%
43                 60921     3.0%
90                 56971     2.8%
181                54974     2.7%
Other values (88)  1196778   58.4%

Value              Count     Frequency (%)
1                  15427     0.8%
4                  1793      0.1%
8                  1272      0.1%
12                 990       < 0.1%
13                 20600     1.0%
19                 125       < 0.1%
23                 7758      0.4%
26                 444       < 0.1%
29                 39530     1.9%
31                 346       < 0.1%

Value              Count     Frequency (%)
70001              17746     0.9%
40765              2304      0.1%
40764              9890      0.5%
999                22904     1.1%
271                10957     0.5%
270                42793     2.1%
269                36806     1.8%
264                275       < 0.1%
263                6         < 0.1%
257                46053     2.2%

country name
Categorical

HIGH CARDINALITY
HIGH CORRELATION
HIGH CORRELATION

Distinct: 98
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 MiB

Rwanda: 137746
Bassas da India: 137093
Syrian Arab Republic: 116588
Philippines: 82099
Mali: 73843
Other values (93): 1503269

Length

Max length: 32
Median length: 7
Mean length: 10.15997168
Min length: 4

Characters and Unicode

Total characters: 20834424
Distinct characters: 53
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
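
As an illustration of those character properties, the sketch below counts the Unicode general category of each character in one country name with Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and the report's own implementation may differ.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters in `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# "Lu" = Uppercase Letter, "Ll" = Lowercase Letter, "Zs" = Space Separator.
counts = category_counts("Bassas da India")
for code, count in counts.items():
    print(code, count)
```

Summing such counts over every value of a column would give totals of the kind listed under "Most occurring categories".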

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Afghanistan
2nd row: Afghanistan
3rd row: Afghanistan
4th row: Afghanistan
5th row: Afghanistan

Common Values

Value                 Count     Frequency (%)
Rwanda                137746    6.7%
Bassas da India       137093    6.7%
Syrian Arab Republic  116588    5.7%
Philippines           82099     4.0%
Mali                  73843     3.6%
Indonesia             72437     3.5%
Kyrgyzstan            61188     3.0%
Burundi               60921     3.0%
Gambia                56971     2.8%
Niger                 54974     2.7%
Other values (88)     1196778   58.4%

Length

Histogram of lengths of the category

Value               Count     Frequency (%)
republic            255964    8.1%
rwanda              137746    4.4%
india               137093    4.4%
bassas              137093    4.4%
da                  137093    4.4%
of                  117266    3.7%
syrian              116588    3.7%
arab                116588    3.7%
philippines         82099     2.6%
democratic          76954     2.5%
Other values (108)  1826222   58.1%
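
The word counts above appear to come from splitting each category value into lowercase words and weighting each word by the number of rows carrying that value. A minimal sketch under that assumption, using only the three most frequent country names, so its totals are partial (for example, "republic" here counts only the Syrian Arab Republic rows). The tokenization rule is an assumption and not taken from the report.

```python
from collections import Counter

# Row counts copied from the Common Values table (top three categories only).
category_counts = {
    "Rwanda": 137746,
    "Bassas da India": 137093,
    "Syrian Arab Republic": 116588,
}

word_counts = Counter()
for category, count in category_counts.items():
    # Assumed tokenization: lowercase the value, then split on whitespace.
    for word in category.lower().split():
        word_counts[word] += count

for word, count in word_counts.most_common():
    print(word, count)
```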

Most occurring characters

Value              Count     Frequency (%)
a                  2996884   14.4%
i                  2108754   10.1%
n                  1607785   7.7%
e                  1352481   6.5%
(space)            1090548   5.2%
o                  871453    4.2%
r                  862256    4.1%
s                  844326    4.1%
d                  738654    3.5%
b                  695010    3.3%
Other values (43)  7666273   36.8%

Most occurring categories

Value              Count      Frequency (%)
Lowercase Letter   16822475   80.7%
Uppercase Letter   2860973    13.7%
Space Separator    1090548    5.2%
Other Punctuation  37790      0.2%
Dash Punctuation   21678      0.1%
Open Punctuation   480        < 0.1%
Close Punctuation  480        < 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a2996884
17.8%
i2108754
12.5%
n1607785
 
9.6%
e1352481
 
8.0%
o871453
 
5.2%
r862256
 
5.1%
s844326
 
5.0%
d738654
 
4.4%
b695010
 
4.1%
l654687
 
3.9%
Other values (16)4090185
24.3%
Uppercase Letter
ValueCountFrequency (%)
R395087
13.8%
B318749
11.1%
S266269
9.3%
I243956
 
8.5%
M193957
 
6.8%
A170289
 
6.0%
C166444
 
5.8%
L154155
 
5.4%
P150996
 
5.3%
N139378
 
4.9%
Other values (12)661693
23.1%
Space Separator
ValueCountFrequency (%)
1090548
100.0%
Other Punctuation
ValueCountFrequency (%)
'37790
100.0%
Dash Punctuation
ValueCountFrequency (%)
-21678
100.0%
Open Punctuation
ValueCountFrequency (%)
(480
100.0%
Close Punctuation
ValueCountFrequency (%)
)480
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin19683448
94.5%
Common1150976
 
5.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
a2996884
15.2%
i2108754
 
10.7%
n1607785
 
8.2%
e1352481
 
6.9%
o871453
 
4.4%
r862256
 
4.4%
s844326
 
4.3%
d738654
 
3.8%
b695010
 
3.5%
l654687
 
3.3%
Other values (38)6951158
35.3%
Common
ValueCountFrequency (%)
1090548
94.7%
'37790
 
3.3%
-21678
 
1.9%
(480
 
< 0.1%
)480
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII20834424
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a2996884
 
14.4%
i2108754
 
10.1%
n1607785
 
7.7%
e1352481
 
6.5%
1090548
 
5.2%
o871453
 
4.2%
r862256
 
4.1%
s844326
 
4.1%
d738654
 
3.5%
b695010
 
3.3%
Other values (43)7666273
36.8%

locality id
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 894
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26310.71243
Minimum: 0
Maximum: 900022
Zeros: 26051
Zeros (%): 1.3%
Negative: 0
Negative (%): 0.0%
Memory size: 15.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 618
Q1: 1510
Median: 2156
Q3: 3433
95-th percentile: 67161
Maximum: 900022
Range: 900022
Interquartile range (IQR): 1923

Descriptive statistics

Standard deviation: 115952.8831
Coefficient of variation (CV): 4.407059802
Kurtosis: 51.08067286
Mean: 26310.71243
Median Absolute Deviation (MAD): 694
Skewness: 7.176866525
Sum: 5.395374672 × 10^10
Variance: 1.34450711 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count     Frequency (%)
21971               34770     1.7%
21972               31610     1.5%
21969               30805     1.5%
21973               30032     1.5%
0                   26051     1.3%
2240                15671     0.8%
2216                14512     0.7%
2842                14051     0.7%
1285                13535     0.7%
2834                13476     0.7%
Other values (884)  1826125   89.1%

Value               Count     Frequency (%)
0                   26051     1.3%
272                 1888      0.1%
273                 210       < 0.1%
274                 210       < 0.1%
275                 210       < 0.1%
276                 240       < 0.1%
277                 210       < 0.1%
278                 911       < 0.1%
279                 210       < 0.1%
280                 286       < 0.1%

Value               Count     Frequency (%)
900022              2026      0.1%
900019              2018      0.1%
900018              2023      0.1%
900017              2022      0.1%
900016              2023      0.1%
900015              2021      0.1%
900014              1935      0.1%
900012              2016      0.1%
900011              1531      0.1%
900009              1982      0.1%

locality name
Categorical

HIGH CARDINALITY
MISSING

Distinct: 617
Distinct (%): < 0.1%
Missing: 611016
Missing (%): 29.8%
Memory size: 15.6 MiB

North/Amajyaruguru: 34770
South/Amajyepfo: 31610
East/Iburasirazuba: 30805
West/Iburengerazuba: 30032
Yobe: 15671
Other values (612): 1296734

Length

Max length: 43
Median length: 8
Mean length: 10.20118337
Min length: 3

Characters and Unicode

Total characters: 14685848
Distinct characters: 62
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Badakhshan
2nd row: Badakhshan
3rd row: Badakhshan
4th row: Badakhshan
5th row: Badakhshan

Common Values

Value                Count     Frequency (%)
North/Amajyaruguru   34770     1.7%
South/Amajyepfo      31610     1.5%
East/Iburasirazuba   30805     1.5%
West/Iburengerazuba  30032     1.5%
Yobe                 15671     0.8%
Borno                14512     0.7%
Homs                 14051     0.7%
Northern             13971     0.7%
Central River        13535     0.7%
Aleppo               13476     0.7%
Other values (607)   1227189   59.8%
(Missing)            611016    29.8%

Length

Histogram of lengths of the category

Value                Count     Frequency (%)
region               121334    6.1%
central              49070     2.5%
north/amajyaruguru   34770     1.7%
south/amajyepfo      31610     1.6%
east/iburasirazuba   30805     1.5%
west/iburengerazuba  30032     1.5%
river                26883     1.3%
western              23963     1.2%
northern             23793     1.2%
southern             22566     1.1%
Other values (683)   1604639   80.3%

Most occurring characters

ValueCountFrequency (%)
a1838743
 
12.5%
o970129
 
6.6%
e947552
 
6.5%
r935903
 
6.4%
i851988
 
5.8%
n829999
 
5.7%
u744450
 
5.1%
t659266
 
4.5%
559843
 
3.8%
s486276
 
3.3%
Other values (52)5861699
39.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter11416522
77.7%
Uppercase Letter2277193
 
15.5%
Space Separator559843
 
3.8%
Other Punctuation149631
 
1.0%
Close Punctuation94836
 
0.6%
Open Punctuation94836
 
0.6%
Dash Punctuation60551
 
0.4%
Connector Punctuation32436
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a1838743
16.1%
o970129
 
8.5%
e947552
 
8.3%
r935903
 
8.2%
i851988
 
7.5%
n829999
 
7.3%
u744450
 
6.5%
t659266
 
5.8%
s486276
 
4.3%
l416167
 
3.6%
Other values (18)2736049
24.0%
Uppercase Letter
ValueCountFrequency (%)
A219573
 
9.6%
C189159
 
8.3%
I179307
 
7.9%
R178809
 
7.9%
S170411
 
7.5%
N156315
 
6.9%
K154031
 
6.8%
M141234
 
6.2%
B128562
 
5.6%
D101222
 
4.4%
Other values (16)658570
28.9%
Other Punctuation
ValueCountFrequency (%)
/137746
92.1%
'9908
 
6.6%
.1977
 
1.3%
Space Separator
ValueCountFrequency (%)
559843
100.0%
Close Punctuation
ValueCountFrequency (%)
)94836
100.0%
Open Punctuation
ValueCountFrequency (%)
(94836
100.0%
Dash Punctuation
ValueCountFrequency (%)
-60551
100.0%
Connector Punctuation
ValueCountFrequency (%)
_32436
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin13693715
93.2%
Common992133
 
6.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
a1838743
 
13.4%
o970129
 
7.1%
e947552
 
6.9%
r935903
 
6.8%
i851988
 
6.2%
n829999
 
6.1%
u744450
 
5.4%
t659266
 
4.8%
s486276
 
3.6%
l416167
 
3.0%
Other values (44)5013242
36.6%
Common
ValueCountFrequency (%)
559843
56.4%
/137746
 
13.9%
)94836
 
9.6%
(94836
 
9.6%
-60551
 
6.1%
_32436
 
3.3%
'9908
 
1.0%
.1977
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII14678368
99.9%
None7480
 
0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a1838743
 
12.5%
o970129
 
6.6%
e947552
 
6.5%
r935903
 
6.4%
i851988
 
5.8%
n829999
 
5.7%
u744450
 
5.1%
t659266
 
4.5%
559843
 
3.8%
s486276
 
3.3%
Other values (50)5854219
39.9%
None
ValueCountFrequency (%)
é6604
88.3%
ï876
 
11.7%

market id
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3266
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1591.206603
Minimum: 80
Maximum: 6083
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.6 MiB

Quantile statistics

Minimum: 80
5-th percentile: 180
Q1: 644
Median: 1441
Q3: 2331
95-th percentile: 4298
Maximum: 6083
Range: 6003
Interquartile range (IQR): 1687

Descriptive statistics

Standard deviation: 1181.314129
Coefficient of variation (CV): 0.7424014749
Kurtosis: 0.5968683393
Mean: 1591.206603
Median Absolute Deviation (MAD): 838
Skewness: 0.9625167145
Sum: 3262988726
Variance: 1395503.071
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count     Frequency (%)
840                  5798      0.3%
305                  5270      0.3%
302                  5270      0.3%
303                  5241      0.3%
306                  5224      0.3%
304                  4596      0.2%
671                  4584      0.2%
842                  4583      0.2%
672                  4568      0.2%
680                  4345      0.2%
Other values (3256)  2001159   97.6%

Value                Count     Frequency (%)
80                   603       < 0.1%
81                   579       < 0.1%
82                   550       < 0.1%
83                   511       < 0.1%
84                   589       < 0.1%
85                   552       < 0.1%
86                   537       < 0.1%
87                   437       < 0.1%
88                   525       < 0.1%
89                   587       < 0.1%

Value                Count     Frequency (%)
6083                 21        < 0.1%
6082                 21        < 0.1%
6081                 21        < 0.1%
6080                 42        < 0.1%
6079                 7         < 0.1%
6078                 7         < 0.1%
6077                 7         < 0.1%
6011                 57        < 0.1%
6010                 60        < 0.1%
6009                 63        < 0.1%

market name
Categorical

HIGH CARDINALITY

Distinct: 3235
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 MiB

National Average: 19748
Bogota: 5798
Dushanbe: 5270
Khujand: 5270
Gharm: 5241
Other values (3230): 2009311

Length

Max length: 42
Median length: 8
Mean length: 8.593822996
Min length: 2

Characters and Unicode

Total characters: 17622820
Distinct characters: 75
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: Fayzabad
2nd row: Fayzabad
3rd row: Fayzabad
4th row: Fayzabad
5th row: Fayzabad

Common Values

Value                Count     Frequency (%)
National Average     19748     1.0%
Bogota               5798      0.3%
Dushanbe             5270      0.3%
Khujand              5270      0.3%
Gharm                5241      0.3%
Bokhtar              5224      0.3%
Khorog               4596      0.2%
Bishkek              4584      0.2%
Medellin             4583      0.2%
Osh                  4568      0.2%
Other values (3225)  1985756   96.8%

Length

Histogram of lengths of the category

Value                Count     Frequency (%)
pasar                70774     2.7%
city                 60163     2.3%
region               44211     1.7%
al                   30422     1.1%
average              21195     0.8%
national             21195     0.8%
town                 13600     0.5%
commune              11141     0.4%
kota                 10818     0.4%
santa                9076      0.3%
Other values (3513)  2353069   88.9%

Most occurring characters

ValueCountFrequency (%)
a2858930
16.2%
i1123262
 
6.4%
o1081994
 
6.1%
n1065451
 
6.0%
e956717
 
5.4%
r923184
 
5.2%
u828083
 
4.7%
595026
 
3.4%
l565982
 
3.2%
t535595
 
3.0%
Other values (65)7088596
40.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter14094635
80.0%
Uppercase Letter2693395
 
15.3%
Space Separator595026
 
3.4%
Dash Punctuation80950
 
0.5%
Close Punctuation57251
 
0.3%
Open Punctuation57251
 
0.3%
Other Punctuation38645
 
0.2%
Decimal Number5318
 
< 0.1%
Connector Punctuation349
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a2858930
20.3%
i1123262
 
8.0%
o1081994
 
7.7%
n1065451
 
7.6%
e956717
 
6.8%
r923184
 
6.5%
u828083
 
5.9%
l565982
 
4.0%
t535595
 
3.8%
g495695
 
3.5%
Other values (22)3659742
26.0%
Uppercase Letter
ValueCountFrequency (%)
M278089
 
10.3%
B275889
 
10.2%
K271571
 
10.1%
A207147
 
7.7%
S191038
 
7.1%
C183785
 
6.8%
P164790
 
6.1%
N143427
 
5.3%
T143101
 
5.3%
G128355
 
4.8%
Other values (17)706203
26.2%
Decimal Number
ValueCountFrequency (%)
21295
24.4%
5866
16.3%
8774
14.6%
0690
13.0%
3672
12.6%
4487
 
9.2%
1485
 
9.1%
649
 
0.9%
Other Punctuation
ValueCountFrequency (%)
'23032
59.6%
.10008
25.9%
/5605
 
14.5%
Space Separator
ValueCountFrequency (%)
595026
100.0%
Dash Punctuation
ValueCountFrequency (%)
-80950
100.0%
Close Punctuation
ValueCountFrequency (%)
)57251
100.0%
Open Punctuation
ValueCountFrequency (%)
(57251
100.0%
Connector Punctuation
ValueCountFrequency (%)
_349
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin16788030
95.3%
Common834790
 
4.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
a2858930
17.0%
i1123262
 
6.7%
o1081994
 
6.4%
n1065451
 
6.3%
e956717
 
5.7%
r923184
 
5.5%
u828083
 
4.9%
l565982
 
3.4%
t535595
 
3.2%
g495695
 
3.0%
Other values (49)6353137
37.8%
Common
ValueCountFrequency (%)
595026
71.3%
-80950
 
9.7%
)57251
 
6.9%
(57251
 
6.9%
'23032
 
2.8%
.10008
 
1.2%
/5605
 
0.7%
21295
 
0.2%
5866
 
0.1%
8774
 
0.1%
Other values (6)2732
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII17559234
99.6%
None63586
 
0.4%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a2858930
16.3%
i1123262
 
6.4%
o1081994
 
6.2%
n1065451
 
6.1%
e956717
 
5.4%
r923184
 
5.3%
u828083
 
4.7%
595026
 
3.4%
l565982
 
3.2%
t535595
 
3.1%
Other values (58)7025010
40.0%
None
ValueCountFrequency (%)
é41441
65.2%
è14669
 
23.1%
ó3910
 
6.1%
ï2075
 
3.3%
â837
 
1.3%
á347
 
0.5%
É307
 
0.5%

commodity purchase id
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 636
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 220.1166403
Minimum: 50
Maximum: 893
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.6 MiB

Quantile statistics

Minimum: 50
5-th percentile: 52
Q1: 73
Median: 141
Q3: 303
95-th percentile: 680
Maximum: 893
Range: 843
Interquartile range (IQR): 230

Descriptive statistics

Standard deviation: 193.8962677
Coefficient of variation (CV): 0.8808796437
Kurtosis: 1.976363931
Mean: 220.1166403
Median Absolute Deviation (MAD): 77
Skewness: 1.558552774
Sum: 451379547
Variance: 37595.76261
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count     Frequency (%)
73                  63025     3.1%
64                  59379     2.9%
51                  59311     2.9%
67                  53356     2.6%
65                  50167     2.4%
52                  48675     2.4%
58                  48564     2.4%
97                  46746     2.3%
71                  41692     2.0%
114                 32430     1.6%
Other values (626)  1547293   75.5%

Value               Count     Frequency (%)
50                  18905     0.9%
51                  59311     2.9%
52                  48675     2.4%
54                  5856      0.3%
55                  13593     0.7%
56                  8002      0.4%
57                  1370      0.1%
58                  48564     2.4%
60                  5752      0.3%
61                  8013      0.4%

Value               Count     Frequency (%)
893                 3113      0.2%
887                 608       < 0.1%
886                 52        < 0.1%
885                 1037      0.1%
884                 592       < 0.1%
883                 820       < 0.1%
882                 927       < 0.1%
881                 929       < 0.1%
880                 934       < 0.1%
879                 934       < 0.1%

commodity purchased
Categorical

HIGH CARDINALITY

Distinct: 838
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 MiB

Millet - Retail: 55898
Rice (imported) - Retail: 53601
Maize - Retail: 48596
Sorghum - Retail: 46507
Wheat flour - Retail: 46360
Other values (833): 1799676

Length

Max length: 55
Median length: 21
Mean length: 21.92605618
Min length: 12

Characters and Unicode

Total characters: 44962404
Distinct characters: 65
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6
Unique (%): < 0.1%

Sample

1st row: Bread - Retail
2nd row: Bread - Retail
3rd row: Bread - Retail
4th row: Bread - Retail
5th row: Bread - Retail

Common Values

Value                     Count     Frequency (%)
Millet - Retail           55898     2.7%
Rice (imported) - Retail  53601     2.6%
Maize - Retail            48596     2.4%
Sorghum - Retail          46507     2.3%
Wheat flour - Retail      46360     2.3%
Sugar - Retail            46082     2.2%
Maize (white) - Retail    41717     2.0%
Rice - Retail             40290     2.0%
Rice (local) - Retail     37815     1.8%
Tomatoes - Retail         31364     1.5%
Other values (828)        1602408   78.1%

Length

Histogram of lengths of the category

Value               Count     Frequency (%)
(empty)             2050638   25.8%
retail              1878421   23.7%
rice                250669    3.2%
maize               175816    2.2%
wholesale           171305    2.2%
white               120355    1.5%
meat                118460    1.5%
oil                 117255    1.5%
beans               109130    1.4%
flour               104270    1.3%
Other values (479)  2840906   35.8%

Most occurring characters

ValueCountFrequency (%)
5887517
13.1%
e4926422
 
11.0%
a4098395
 
9.1%
l3673779
 
8.2%
i3570863
 
7.9%
t3214451
 
7.1%
R2129741
 
4.7%
-2112496
 
4.7%
o1399825
 
3.1%
s1361732
 
3.0%
Other values (55)12587183
28.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter30219451
67.2%
Space Separator5887517
 
13.1%
Uppercase Letter4141199
 
9.2%
Dash Punctuation2112496
 
4.7%
Open Punctuation1168761
 
2.6%
Close Punctuation1168761
 
2.6%
Other Punctuation257618
 
0.6%
Decimal Number6591
 
< 0.1%
Math Symbol10
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e4926422
16.3%
a4098395
13.6%
l3673779
12.2%
i3570863
11.8%
t3214451
10.6%
o1399825
 
4.6%
s1361732
 
4.5%
r1140253
 
3.8%
n885881
 
2.9%
h835541
 
2.8%
Other values (16)5112309
16.9%
Uppercase Letter
ValueCountFrequency (%)
R2129741
51.4%
M401078
 
9.7%
W293377
 
7.1%
S234389
 
5.7%
B184606
 
4.5%
O173190
 
4.2%
C160938
 
3.9%
F120663
 
2.9%
P110426
 
2.7%
T71282
 
1.7%
Other values (15)261509
 
6.3%
Decimal Number
ValueCountFrequency (%)
91307
19.8%
51307
19.8%
01182
17.9%
81024
15.5%
11015
15.4%
2756
11.5%
Other Punctuation
ValueCountFrequency (%)
,244443
94.9%
'7764
 
3.0%
/5411
 
2.1%
Space Separator
ValueCountFrequency (%)
5887517
100.0%
Dash Punctuation
ValueCountFrequency (%)
-2112496
100.0%
Open Punctuation
ValueCountFrequency (%)
(1168761
100.0%
Close Punctuation
ValueCountFrequency (%)
)1168761
100.0%
Math Symbol
ValueCountFrequency (%)
+10
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin34360650
76.4%
Common10601754
 
23.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e4926422
14.3%
a4098395
11.9%
l3673779
10.7%
i3570863
10.4%
t3214451
9.4%
R2129741
 
6.2%
o1399825
 
4.1%
s1361732
 
4.0%
r1140253
 
3.3%
n885881
 
2.6%
Other values (41)7959308
23.2%
Common
ValueCountFrequency (%)
5887517
55.5%
-2112496
 
19.9%
(1168761
 
11.0%
)1168761
 
11.0%
,244443
 
2.3%
'7764
 
0.1%
/5411
 
0.1%
91307
 
< 0.1%
51307
 
< 0.1%
01182
 
< 0.1%
Other values (4)2805
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII44962404
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5887517
13.1%
e4926422
 
11.0%
a4098395
 
9.1%
l3673779
 
8.2%
i3570863
 
7.9%
t3214451
 
7.1%
R2129741
 
4.7%
-2112496
 
4.7%
o1399825
 
3.1%
s1361732
 
3.0%
Other values (55)12587183
28.0%

currency id
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 MiB

0.0: 2050638

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 6151914
Distinct characters: 2
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value  Count     Frequency (%)
0.0    2050638   100.0%

Length

Histogram of lengths of the category

Pie chart

Value  Count     Frequency (%)
0.0    2050638   100.0%

Most occurring characters

ValueCountFrequency (%)
04101276
66.7%
.2050638
33.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number4101276
66.7%
Other Punctuation2050638
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
04101276
100.0%
Other Punctuation
ValueCountFrequency (%)
.2050638
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common6151914
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
04101276
66.7%
.2050638
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII6151914
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
04101276
66.7%
.2050638
33.3%

name of currency
Categorical

HIGH CARDINALITY
HIGH CORRELATION
HIGH CORRELATION

Distinct: 84
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 MiB

XOF: 270646
RWF: 137746
INR: 137093
SYP: 116588
PHP: 82099
Other values (79): 1306466

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 6151914
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: AFN
2nd row: AFN
3rd row: AFN
4th row: AFN
5th row: AFN

Common Values

Value              Count     Frequency (%)
XOF                270646    13.2%
RWF                137746    6.7%
INR                137093    6.7%
SYP                116588    5.7%
PHP                82099     4.0%
IDR                72437     3.5%
KGS                61188     3.0%
BIF                60921     3.0%
XAF                59853     2.9%
GMD                56971     2.8%
Other values (74)  995096    48.5%

Length

Histogram of lengths of the category

Value              Count     Frequency (%)
xof                270646    13.2%
rwf                137746    6.7%
inr                137093    6.7%
syp                116588    5.7%
php                82099     4.0%
idr                72437     3.5%
kgs                61188     3.0%
bif                60921     3.0%
xaf                59853     2.9%
gmd                56971     2.8%
Other values (74)  995096    48.5%

Most occurring characters

ValueCountFrequency (%)
F609467
 
9.9%
R476439
 
7.7%
S442070
 
7.2%
P407912
 
6.6%
O374804
 
6.1%
N373245
 
6.1%
D362213
 
5.9%
X342207
 
5.6%
I322785
 
5.2%
M267172
 
4.3%
Other values (16)2173600
35.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter6151914
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
F609467
 
9.9%
R476439
 
7.7%
S442070
 
7.2%
P407912
 
6.6%
O374804
 
6.1%
N373245
 
6.1%
D362213
 
5.9%
X342207
 
5.6%
I322785
 
5.2%
M267172
 
4.3%
Other values (16)2173600
35.3%

Most occurring scripts

ValueCountFrequency (%)
Latin6151914
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
F609467
 
9.9%
R476439
 
7.7%
S442070
 
7.2%
P407912
 
6.6%
O374804
 
6.1%
N373245
 
6.1%
D362213
 
5.9%
X342207
 
5.6%
I322785
 
5.2%
M267172
 
4.3%
Other values (16)2173600
35.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII6151914
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
F609467
 
9.9%
R476439
 
7.7%
S442070
 
7.2%
P407912
 
6.6%
O374804
 
6.1%
N373245
 
6.1%
D362213
 
5.9%
X342207
 
5.6%
I322785
 
5.2%
M267172
 
4.3%
Other values (16)2173600
35.3%

market type id
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 MiB

15: 1878421
14: 171305
18: 664
17: 248

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 4101276
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 15
2nd row: 15
3rd row: 15
4th row: 15
5th row: 15

Common Values

Value  Count     Frequency (%)
15     1878421   91.6%
14     171305    8.4%
18     664       < 0.1%
17     248       < 0.1%

Length

Histogram of lengths of the category

Pie chart

Value  Count     Frequency (%)
15     1878421   91.6%
14     171305    8.4%
18     664       < 0.1%
17     248       < 0.1%

Most occurring characters

ValueCountFrequency (%)
12050638
50.0%
51878421
45.8%
4171305
 
4.2%
8664
 
< 0.1%
7248
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number4101276
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
12050638
50.0%
51878421
45.8%
4171305
 
4.2%
8664
 
< 0.1%
7248
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Common4101276
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
12050638
50.0%
51878421
45.8%
4171305
 
4.2%
8664
 
< 0.1%
7248
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII4101276
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
12050638
50.0%
51878421
45.8%
4171305
 
4.2%
8664
 
< 0.1%
7248
 
< 0.1%

market name.1
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 MiB

Retail: 1878421
Wholesale: 171305
Farm Gate: 664
Producer: 248

Length

Max length: 9
Median length: 6
Mean length: 6.251825529
Min length: 6

Characters and Unicode

Total characters: 12820231
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Retail
2nd row: Retail
3rd row: Retail
4th row: Retail
5th row: Retail

Common Values

Value      Count     Frequency (%)
Retail     1878421   91.6%
Wholesale  171305    8.4%
Farm Gate  664       < 0.1%
Producer   248       < 0.1%

Length

Histogram of lengths of the category

Pie chart

Value      Count     Frequency (%)
retail     1878421   91.6%
wholesale  171305    8.4%
farm       664       < 0.1%
gate       664       < 0.1%
producer   248       < 0.1%

Most occurring characters

ValueCountFrequency (%)
e2221943
17.3%
l2221031
17.3%
a2051054
16.0%
t1879085
14.7%
R1878421
14.7%
i1878421
14.7%
o171553
 
1.3%
s171305
 
1.3%
h171305
 
1.3%
W171305
 
1.3%
Other values (9)4808
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter10768265
84.0%
Uppercase Letter2051302
 
16.0%
Space Separator664
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e2221943
20.6%
l2221031
20.6%
a2051054
19.0%
t1879085
17.5%
i1878421
17.4%
o171553
 
1.6%
s171305
 
1.6%
h171305
 
1.6%
r1160
 
< 0.1%
m664
 
< 0.1%
Other values (3)744
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
R1878421
91.6%
W171305
 
8.4%
F664
 
< 0.1%
G664
 
< 0.1%
P248
 
< 0.1%
Space Separator
ValueCountFrequency (%)
664
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin12819567
> 99.9%
Common664
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e2221943
17.3%
l2221031
17.3%
a2051054
16.0%
t1879085
14.7%
R1878421
14.7%
i1878421
14.7%
o171553
 
1.3%
s171305
 
1.3%
h171305
 
1.3%
W171305
 
1.3%
Other values (8)4144
 
< 0.1%
Common
ValueCountFrequency (%)
664
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII12820231
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e2221943
17.3%
l2221031
17.3%
a2051054
16.0%
t1879085
14.7%
R1878421
14.7%
i1878421
14.7%
o171553
 
1.3%
s171305
 
1.3%
h171305
 
1.3%
W171305
 
1.3%
Other values (9)4808
 
< 0.1%

measurement id
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 125
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14.86923777
Minimum: 5
Maximum: 175
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.6 MiB

Quantile statistics

Minimum: 5
5-th percentile: 5
Q1: 5
Median: 5
Q3: 9
95-th percentile: 69
Maximum: 175
Range: 170
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 25.98689567
Coefficient of variation (CV): 1.747695213
Kurtosis: 13.42317863
Mean: 14.86923777
Median Absolute Deviation (MAD): 0
Skewness: 3.53663878
Sum: 30491424
Variance: 675.3187466
Monotonicity: Not monotonic
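
Several of the figures above are derived from the others, which allows a quick consistency check. The following is a minimal sketch, assuming the usual definitions (mean = sum / number of observations, coefficient of variation = standard deviation / mean, variance = standard deviation squared). The function name `derived_stats` is illustrative only and is not part of the report generator.

```python
def derived_stats(total, n_obs, std_dev, minimum, maximum, q1, q3):
    """Recompute derived descriptive statistics from the primary ones."""
    mean = total / n_obs
    return {
        "mean": mean,
        "cv": std_dev / mean,          # coefficient of variation
        "variance": std_dev ** 2,
        "range": maximum - minimum,
        "iqr": q3 - q1,                # interquartile range
    }


# Values reported for "measurement id"; 2050638 is the dataset's
# number of observations from the overview.
stats = derived_stats(
    total=30491424, n_obs=2050638, std_dev=25.98689567,
    minimum=5, maximum=175, q1=5, q3=9,
)
for name, value in stats.items():
    print(f"{name}: {value:.6g}")
```

The recomputed values should agree with the reported mean, CV, variance, range and IQR up to rounding.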
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
5 | 1523770 | 74.3%
15 | 138409 | 6.7%
9 | 51612 | 2.5%
33 | 38615 | 1.9%
51 | 17214 | 0.8%
61 | 14848 | 0.7%
22 | 14036 | 0.7%
17 | 13261 | 0.6%
108 | 11793 | 0.6%
30 | 11616 | 0.6%
Other values (115) | 215464 | 10.5%
Value | Count | Frequency (%)
5 | 1523770 | 74.3%
9 | 51612 | 2.5%
14 | 7634 | 0.4%
15 | 138409 | 6.7%
16 | 6622 | 0.3%
17 | 13261 | 0.6%
18 | 5293 | 0.3%
19 | 158 | < 0.1%
20 | 1893 | 0.1%
21 | 2252 | 0.1%
Value | Count | Frequency (%)
175 | 60 | < 0.1%
171 | 17 | < 0.1%
170 | 158 | < 0.1%
169 | 1011 | < 0.1%
168 | 820 | < 0.1%
167 | 927 | < 0.1%
166 | 929 | < 0.1%
164 | 804 | < 0.1%
163 | 820 | < 0.1%
161 | 2630 | 0.1%

unit of goods measurement
Categorical

HIGH CARDINALITY

Distinct: 125
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 MiB

KG | 1523770
L | 138409
100 KG | 51612
Unit | 38615
Day | 17214
Other values (120) | 281018

Length

Max length: 11
Median length: 2
Mean length: 2.535615745
Min length: 1

Characters and Unicode

Total characters: 5199630
Distinct characters: 47
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
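
As a rough illustration of how such per-category counts can be obtained, the sketch below uses Python's standard `unicodedata` module. The helper `category_counts` and the sample list are hypothetical; the report's own implementation may differ. The standard library exposes the general category but not the script property, so the Latin/Common script split is not reproduced here.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Hypothetical mini-sample of unit labels, not the full column.
sample = ["KG", "100 KG", "Unit", "1.5 L"]
print(category_counts(sample))
```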

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: KG
2nd row: KG
3rd row: KG
4th row: KG
5th row: KG

Common Values

Value | Count | Frequency (%)
KG | 1523770 | 74.3%
L | 138409 | 6.7%
100 KG | 51612 | 2.5%
Unit | 38615 | 1.9%
Day | 17214 | 0.8%
Head | 14848 | 0.7%
50 KG | 14036 | 0.7%
90 KG | 13261 | 0.6%
46 KG | 11793 | 0.6%
Pound | 11616 | 0.6%
Other values (115) | 215464 | 10.5%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
kg | 1686335 | 73.3%
l | 146332 | 6.4%
100 | 57058 | 2.5%
g | 42697 | 1.9%
unit | 38615 | 1.7%
pcs | 19493 | 0.8%
day | 17214 | 0.7%
50 | 15423 | 0.7%
head | 14848 | 0.6%
90 | 13261 | 0.6%
Other values (98) | 249996 | 10.9%

Most occurring characters

Value | Count | Frequency (%)
G | 1735945 | 33.4%
K | 1686335 | 32.4%
(space) | 250634 | 4.8%
0 | 243595 | 4.7%
L | 175423 | 3.4%
1 | 130679 | 2.5%
5 | 73943 | 1.4%
a | 65355 | 1.3%
n | 62350 | 1.2%
U | 58862 | 1.1%
Other values (37) | 716509 | 13.8%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 3784150 | 72.8%
Decimal Number | 600107 | 11.5%
Lowercase Letter | 514227 | 9.9%
Space Separator | 250634 | 4.8%
Other Punctuation | 50512 | 1.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 65355 | 12.7%
n | 62350 | 12.1%
i | 54468 | 10.6%
t | 52922 | 10.3%
e | 37754 | 7.3%
c | 31798 | 6.2%
d | 29122 | 5.7%
o | 25645 | 5.0%
s | 25446 | 4.9%
u | 21848 | 4.2%
Other values (11) | 107519 | 20.9%
Uppercase Letter
Value | Count | Frequency (%)
G | 1735945 | 45.9%
K | 1686335 | 44.6%
L | 175423 | 4.6%
U | 58862 | 1.6%
D | 28213 | 0.7%
M | 24424 | 0.6%
P | 19232 | 0.5%
H | 14882 | 0.4%
C | 14453 | 0.4%
S | 12774 | 0.3%
Other values (3) | 13607 | 0.4%
Decimal Number
Value | Count | Frequency (%)
0 | 243595 | 40.6%
1 | 130679 | 21.8%
5 | 73943 | 12.3%
2 | 36896 | 6.1%
3 | 30192 | 5.0%
4 | 28892 | 4.8%
9 | 21664 | 3.6%
6 | 19491 | 3.2%
8 | 7745 | 1.3%
7 | 7010 | 1.2%
Other Punctuation
Value | Count | Frequency (%)
. | 40235 | 79.7%
/ | 10277 | 20.3%
Space Separator
Value | Count | Frequency (%)
(space) | 250634 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 4298377 | 82.7%
Common | 901253 | 17.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
G | 1735945 | 40.4%
K | 1686335 | 39.2%
L | 175423 | 4.1%
a | 65355 | 1.5%
n | 62350 | 1.5%
U | 58862 | 1.4%
i | 54468 | 1.3%
t | 52922 | 1.2%
e | 37754 | 0.9%
c | 31798 | 0.7%
Other values (24) | 337165 | 7.8%
Common
Value | Count | Frequency (%)
(space) | 250634 | 27.8%
0 | 243595 | 27.0%
1 | 130679 | 14.5%
5 | 73943 | 8.2%
. | 40235 | 4.5%
2 | 36896 | 4.1%
3 | 30192 | 3.4%
4 | 28892 | 3.2%
9 | 21664 | 2.4%
6 | 19491 | 2.2%
Other values (3) | 25032 | 2.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 5199630 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
G | 1735945 | 33.4%
K | 1686335 | 32.4%
(space) | 250634 | 4.8%
0 | 243595 | 4.7%
L | 175423 | 3.4%
1 | 130679 | 2.5%
5 | 73943 | 1.4%
a | 65355 | 1.3%
n | 62350 | 1.2%
U | 58862 | 1.1%
Other values (37) | 716509 | 13.8%

month recorded
Real number (ℝ≥0)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.363020679
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.6 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
Median: 6
Q3: 9
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.403188688
Coefficient of variation (CV): 0.5348385397
Kurtosis: -1.179686369
Mean: 6.363020679
Median Absolute Deviation (MAD): 3
Skewness: 0.05451027146
Sum: 13048252
Variance: 11.58169325
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
Value | Count | Frequency (%)
6 | 182132 | 8.9%
5 | 181862 | 8.9%
3 | 179884 | 8.8%
7 | 177613 | 8.7%
4 | 176659 | 8.6%
2 | 173404 | 8.5%
1 | 172090 | 8.4%
8 | 166226 | 8.1%
10 | 165525 | 8.1%
9 | 164928 | 8.0%
Other values (2) | 310315 | 15.1%
Value | Count | Frequency (%)
1 | 172090 | 8.4%
2 | 173404 | 8.5%
3 | 179884 | 8.8%
4 | 176659 | 8.6%
5 | 181862 | 8.9%
6 | 182132 | 8.9%
7 | 177613 | 8.7%
8 | 166226 | 8.1%
9 | 164928 | 8.0%
10 | 165525 | 8.1%
Value | Count | Frequency (%)
12 | 154798 | 7.5%
11 | 155517 | 7.6%
10 | 165525 | 8.1%
9 | 164928 | 8.0%
8 | 166226 | 8.1%
7 | 177613 | 8.7%
6 | 182132 | 8.9%
5 | 181862 | 8.9%
4 | 176659 | 8.6%
3 | 179884 | 8.8%

year recorded
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 32
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2016.130844
Minimum: 1990
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.6 MiB

Quantile statistics

Minimum: 1990
5-th percentile: 2008
Q1: 2014
Median: 2017
Q3: 2020
95-th percentile: 2021
Maximum: 2021
Range: 31
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.458825267
Coefficient of variation (CV): 0.002211575346
Kurtosis: 2.220083238
Mean: 2016.130844
Median Absolute Deviation (MAD): 3
Skewness: -1.344576832
Sum: 4134354521
Variance: 19.88112276
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=32)
Value | Count | Frequency (%)
2020 | 395781 | 19.3%
2021 | 203084 | 9.9%
2019 | 202032 | 9.9%
2018 | 183970 | 9.0%
2017 | 173234 | 8.4%
2016 | 147333 | 7.2%
2015 | 135859 | 6.6%
2014 | 121016 | 5.9%
2013 | 107881 | 5.3%
2012 | 86311 | 4.2%
Other values (22) | 294137 | 14.3%
Value | Count | Frequency (%)
1990 | 140 | < 0.1%
1991 | 134 | < 0.1%
1992 | 280 | < 0.1%
1993 | 284 | < 0.1%
1994 | 1537 | 0.1%
1995 | 1287 | 0.1%
1996 | 2022 | 0.1%
1997 | 1638 | 0.1%
1998 | 1792 | 0.1%
1999 | 1713 | 0.1%
Value | Count | Frequency (%)
2021 | 203084 | 9.9%
2020 | 395781 | 19.3%
2019 | 202032 | 9.9%
2018 | 183970 | 9.0%
2017 | 173234 | 8.4%
2016 | 147333 | 7.2%
2015 | 135859 | 6.6%
2014 | 121016 | 5.9%
2013 | 107881 | 5.3%
2012 | 86311 | 4.2%

price paid
Real number (ℝ≥0)

SKEWED

Distinct: 239811
Distinct (%): 11.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6413.983952
Minimum: 0
Maximum: 21777780
Zeros: 34
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 15.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 45
Median: 246.55585
Q3: 1200
95-th percentile: 22000
Maximum: 21777780
Range: 21777780
Interquartile range (IQR): 1155

Descriptive statistics

Standard deviation: 106977.235
Coefficient of variation (CV): 16.67875002
Kurtosis: 14317.0661
Mean: 6413.983952
Median Absolute Deviation (MAD): 232.69585
Skewness: 107.5510841
Sum: 1.315275922 × 10^10
Variance: 1.144412881 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
200 | 23571 | 1.1%
500 | 22600 | 1.1%
300 | 21174 | 1.0%
400 | 18858 | 0.9%
1000 | 18673 | 0.9%
250 | 18153 | 0.9%
100 | 15307 | 0.7%
50 | 14465 | 0.7%
350 | 13828 | 0.7%
150 | 13606 | 0.7%
Other values (239801) | 1870403 | 91.2%
Value | Count | Frequency (%)
0 | 34 | < 0.1%
0.01 | 4 | < 0.1%
0.0125 | 1 | < 0.1%
0.02 | 1 | < 0.1%
0.09 | 1 | < 0.1%
0.1 | 28 | < 0.1%
0.1001 | 2 | < 0.1%
0.1005 | 1 | < 0.1%
0.105 | 1 | < 0.1%
0.1078 | 1 | < 0.1%
Value | Count | Frequency (%)
21777780 | 1 | < 0.1%
19777777 | 1 | < 0.1%
18666666 | 1 | < 0.1%
17250000 | 1 | < 0.1%
17200000 | 1 | < 0.1%
17000000 | 10 | < 0.1%
16250000 | 1 | < 0.1%
16000000 | 1 | < 0.1%
15000000 | 4 | < 0.1%
14800000 | 1 | < 0.1%

mp_commoditysource
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 2050638
Missing (%): 100.0%
Memory size: 15.6 MiB

Interactions

[Figures: 64 pairwise interaction plots, not reproduced here]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
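
The calculation just described can be sketched in plain Python as follows. This is an illustrative implementation, not the report generator's code; the function names are hypothetical, and ties are assumed to receive the average of the ranks they span.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship: rho is 1, Pearson's r is below 1.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```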

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the slope does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
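
A minimal sketch of this calculation, with a check of the invariance under shifts and positive rescaling. The function name and the sample data are illustrative assumptions, not taken from the report.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance over the product of standard deviations.

    The 1/n factors of the covariance and the two standard deviations
    cancel, so plain sums of deviations are used.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)

# Shift and rescale each variable separately; r should be unchanged.
x_scaled = [10 * a + 3 for a in x]
y_scaled = [0.5 * b - 7 for b in y]
r_scaled = pearson_r(x_scaled, y_scaled)
print(r, r_scaled)
```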

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
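
The pair-counting definition above can be sketched directly. This version is quadratic in the number of observations and counts tied pairs as neither concordant nor discordant (the variant often called tau-a); statistical libraries commonly apply a tie-adjusted variant, so results can differ when ties are present. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1      # both variables move in the same direction
        elif product < 0:
            discordant += 1      # the variables move in opposite directions
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Five concordant pairs and one discordant pair out of six.
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```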

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators of Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
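
A sketch of the bias-corrected estimator, assuming the commonly cited form of Bergsma's correction: the chi-squared statistic is computed without continuity correction, then φ² and the table dimensions are each adjusted by terms of order 1/(n-1). The function name is hypothetical and the report's exact implementation may differ.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long categorical sequences."""
    n = len(x)
    row_totals = Counter(x)
    col_totals = Counter(y)
    observed = Counter(zip(x, y))

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for r_val, r_count in row_totals.items():
        for c_val, c_count in col_totals.items():
            expected = r_count * c_count / n
            chi2 += (observed[(r_val, c_val)] - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Bias corrections (assumed form, see lead-in).
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0  # a constant variable carries no association
    return math.sqrt(phi2_corr / denom)


# Perfectly associated labels versus independent labels.
print(cramers_v_corrected(["a", "b"] * 50, ["u", "v"] * 50))
print(cramers_v_corrected(["a", "a", "b", "b"] * 25, ["u", "v", "u", "v"] * 25))
```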

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. See the Phik project documentation for details.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
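
The nullity-by-column summary behind the first of these plots can be approximated with a few lines of plain Python. This is a sketch under the assumption that missing cells are represented as `None`; the function name and the three sample rows are hypothetical.

```python
def nullity_by_column(rows):
    """Return {column: (missing_count, missing_fraction)} for a list of dicts."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            # A boolean adds 1 when the cell is missing and 0 otherwise.
            counts[column] = counts.get(column, 0) + (value is None)
    total = len(rows)
    return {column: (missing, missing / total) for column, missing in counts.items()}


# Hypothetical mini-sample: one fully populated column, one entirely missing.
rows = [
    {"price paid": 50.0, "mp_commoditysource": None},
    {"price paid": 110.625, "mp_commoditysource": None},
    {"price paid": 197.0, "mp_commoditysource": None},
]
print(nullity_by_column(rows))
```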

Sample

First rows

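(index) | country id | country name | locality id | locality name | market id | market name | commodity purchase id | commodity purchased | currency id | name of currency | market type id | market name.1 | measurement id | unit of goods measurement | month recorded | year recorded | price paid | mp_commoditysource
0 | 1.0 | Afghanistan | 272 | Badakhshan | 266 | Fayzabad | 55 | Bread - Retail | 0.0 | AFN | 15 | Retail | 5 | KG | 1 | 2014 | 50.0 | NaN
1 | 1.0 | Afghanistan | 272 | Badakhshan | 266 | Fayzabad | 55 | Bread - Retail | 0.0 | AFN | 15 | Retail | 5 | KG | 2 | 2014 | 50.0 | NaN
2 | 1.0 | Afghanistan | 272 | Badakhshan | 266 | Fayzabad | 55 | Bread - Retail | 0.0 | AFN | 15 | Retail | 5 | KG | 3 | 2014 | 50.0 | NaN
3 | 1.0 | Afghanistan | 272 | Badakhshan | 266 | Fayzabad | 55 | Bread - Retail | 0.0 | AFN | 15 | Retail | 5 | KG | 4 | 2014 | 50.0 | NaN
4 | 1.0 | Afghanistan | 272 | Badakhshan | 266 | Fayzabad | 55 | Bread - Retail | 0.0 | AFN | 15 | Retail | 5 | KG | 5 | 2014 | 50.0 | NaN
5 | 1.0 | Afghanistan | 272 | Badakhshan | 266 | Fayzabad | 55 | Bread - Retail | 0.0 | AFN | 15 | Retail | 5 | KG | 6 | 2014 | 50.0 | NaN
6 | 1.0 | Afghanistan | 272 | Badakhshan | 266 | Fayzabad | 55 | Bread - Retail | 0.0 | AFN | 15 | Retail | 5 | KG | 7 | 2014 | 50.0 | NaN
7 | 1.0 | Afghanistan | 272 | Badakhshan | 266 | Fayzabad | 55 | Bread - Retail | 0.0 | AFN | 15 | Retail | 5 | KG | 8 | 2014 | 50.0 | NaN
8 | 1.0 | Afghanistan | 272 | Badakhshan | 266 | Fayzabad | 55 | Bread - Retail | 0.0 | AFN | 15 | Retail | 5 | KG | 9 | 2014 | 50.0 | NaN
9 | 1.0 | Afghanistan | 272 | Badakhshan | 266 | Fayzabad | 55 | Bread - Retail | 0.0 | AFN | 15 | Retail | 5 | KG | 10 | 2014 | 50.0 | NaN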

Last rows

(index) | country id | country name | locality id | locality name | market id | market name | commodity purchase id | commodity purchased | currency id | name of currency | market type id | market name.1 | measurement id | unit of goods measurement | month recorded | year recorded | price paid | mp_commoditysource
2050628 | 271.0 | Zimbabwe | 3444 | Midlands | 5594 | Mbilashaba | 52 | Rice - Retail | 0.0 | ZWL | 15 | Retail | 5 | KG | 6 | 2021 | 110.6250 | NaN
2050629 | 271.0 | Zimbabwe | 3444 | Midlands | 5594 | Mbilashaba | 54 | Maize meal - Retail | 0.0 | ZWL | 15 | Retail | 5 | KG | 6 | 2021 | 50.0000 | NaN
2050630 | 271.0 | Zimbabwe | 3444 | Midlands | 5594 | Mbilashaba | 96 | Oil (vegetable) - Retail | 0.0 | ZWL | 15 | Retail | 15 | L | 6 | 2021 | 197.0000 | NaN
2050631 | 271.0 | Zimbabwe | 3444 | Midlands | 5594 | Mbilashaba | 97 | Sugar - Retail | 0.0 | ZWL | 15 | Retail | 5 | KG | 6 | 2021 | 118.3750 | NaN
2050632 | 271.0 | Zimbabwe | 3444 | Midlands | 5594 | Mbilashaba | 185 | Salt - Retail | 0.0 | ZWL | 15 | Retail | 5 | KG | 6 | 2021 | 71.0000 | NaN
2050633 | 271.0 | Zimbabwe | 3444 | Midlands | 5594 | Mbilashaba | 432 | Beans (sugar) - Retail | 0.0 | ZWL | 15 | Retail | 5 | KG | 6 | 2021 | 233.3333 | NaN
2050634 | 271.0 | Zimbabwe | 3444 | Midlands | 5594 | Mbilashaba | 539 | Toothpaste - Retail | 0.0 | ZWL | 15 | Retail | 116 | 100 ML | 6 | 2021 | 112.5000 | NaN
2050635 | 271.0 | Zimbabwe | 3444 | Midlands | 5594 | Mbilashaba | 540 | Laundry soap - Retail | 0.0 | ZWL | 15 | Retail | 5 | KG | 6 | 2021 | 114.0000 | NaN
2050636 | 271.0 | Zimbabwe | 3444 | Midlands | 5594 | Mbilashaba | 541 | Handwash soap - Retail | 0.0 | ZWL | 15 | Retail | 66 | 250 G | 6 | 2021 | 59.5000 | NaN
2050637 | 271.0 | Zimbabwe | 3444 | Midlands | 5594 | Mbilashaba | 887 | Fish (kapenta) - Retail | 0.0 | ZWL | 15 | Retail | 5 | KG | 6 | 2021 | 1200.0000 | NaN